This page last changed on Jan 18, 2009 by straha1.
Tutorial

This page is part of a tutorial that explains how to

Now that you've learned to compile programs and submit jobs, you need to know how to monitor and delete them. (Make sure you read the QDel section of this page.) The PBS queuing system includes a number of programs for examining the PBS queue and monitoring or controlling your jobs. This page discusses the following topics:

QDel: Canceling a Job
QStat: Job Status Information

For information on submitting jobs using QSub, see the second part of this tutorial or our page on QSub: Using QSub. The qsub, qstat and qdel commands all have manual pages, which you can read with the UNIX man program:

man qstat
man qdel
man qsub

For detailed information on QSub, QStat and QDel, see Running Jobs on HPC.

QDel: Canceling a Job

Occasionally you might realize you messed up an input parameter, typed the wrong executable name or made some other mistake. Rather than letting your incorrectly-configured job run, you can cancel it using the qdel command:

qdel 3172.hpc.cl.rs.umbc.edu

where "3172.hpc.cl.rs.umbc.edu" is the job number returned from qsub. (If you forgot your job number, you can use qstat to determine what it is.) QDel can even cancel your job after it has started running. It may take a minute or two for your job to be deleted from the queue. You can use qstat to monitor the progress of the deletion.
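The cancel-then-monitor sequence described above could be scripted roughly as follows. This is only a sketch: the function name cancel_and_wait and the POLL_SECONDS and MAX_TRIES settings are our own illustrative choices, not part of PBS. It also assumes that qstat exits with a non-zero status once the job number is no longer known, which you should confirm with man qstat on your system.

```shell
# Sketch: cancel a job, then poll qstat until the job is no longer listed.
# cancel_and_wait, POLL_SECONDS and MAX_TRIES are illustrative names only.
cancel_and_wait() {
    job_id=$1
    poll_seconds=${POLL_SECONDS:-5}   # seconds to wait between checks
    max_tries=${MAX_TRIES:-60}        # give up after this many checks

    # Ask PBS to delete the job; stop if qdel itself reports an error.
    qdel "$job_id" || return 1

    tries=0
    # Assumption: qstat exits non-zero once the job has left the queue.
    while qstat "$job_id" >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "Job $job_id is still listed after $tries checks" >&2
            return 2
        fi
        sleep "$poll_seconds"
    done
    echo "Job $job_id is no longer in the queue"
}
```

You would call it as cancel_and_wait 3172.hpc.cl.rs.umbc.edu, substituting your own job number.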

QStat: Job Status Information

Examining Your Jobs

Your job might be sitting in the queue for a while before it runs, depending on how many people are using the cluster. You can check the status of your job using qstat:

qstat 3172.hpc.cl.rs.umbc.edu

where 3172.hpc.cl.rs.umbc.edu should be replaced by whatever job number qsub returned. If your job is in the queue or running, that command should print out a message much like this:

Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
3172.hpc            hello_parallel   straha1                0 R low_priority

where straha1 is replaced by your user name. The R indicates that your job is running. If you see a Q there, then your job is in the queue waiting to run. If qstat gives you this message:
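If you want to pick the state letter out of that output in a script, something like the following could work. It is a minimal sketch: job_states is a hypothetical helper name, and it assumes the six-column layout shown above (two header lines, with S as the fifth column and no spaces inside the job name).

```shell
# Sketch: read default qstat output on standard input and print the
# job id and state letter (for example R or Q) for each job row.
# Assumes two header lines and the state in the fifth column.
job_states() {
    awk 'NR > 2 && NF >= 6 { print $1, $5 }'
}
```

For example, qstat 3172.hpc.cl.rs.umbc.edu | job_states would print the job id followed by R or Q.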

qstat: Unknown Job Id 3172.hpc.cl.rs.umbc.edu

then your job has either aborted, completed normally or been deleted. You can get much more detailed information about your job using the -f option to qstat:

qstat -f 3172.hpc.cl.rs.umbc.edu

which will print out extensive information, including the number of nodes used, the number of processors per node, which nodes were allocated, the queue, and much more.
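Because the full listing is long, it can help to extract a single field. The sketch below assumes that qstat -f prints each attribute on its own line in the form "name = value"; job_attribute is a hypothetical helper, and attribute names such as job_state are what we would expect to see, so check them against the real output on your cluster. Values that wrap onto continuation lines are returned only up to the wrap.

```shell
# Sketch: print the value of one attribute from "qstat -f" output
# read on standard input. Assumes lines of the form "name = value".
job_attribute() {
    awk -v name="$1" '
        {
            pos = index($0, " = ")        # locate the separator
            if (pos == 0) next            # not an attribute line
            key = substr($0, 1, pos - 1)
            gsub(/^[ \t]+/, "", key)      # drop leading indentation
            if (key == name) {
                print substr($0, pos + 3) # text after " = "
                exit
            }
        }
    '
}
```

A call such as qstat -f 3172.hpc.cl.rs.umbc.edu | job_attribute job_state would then print only that attribute's value, or nothing if the attribute is absent.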

Examining the PBS Queue

You can see the list of all jobs in the queue by simply typing qstat (without any job number or options) which might produce something like this:

Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
3166.hpc            MPI_DG           gobbert         00:13:18 R low_priority
3167.hpc            MPI_DG           gobbert         00:41:01 R low_priority
3168.hpc            MPI_DG           gobbert         01:33:29 R low_priority
3171.hpc            llcbench         straha1         00:13:24 R low_priority
3172.hpc            hello_parallel   straha1                0 Q low_priority

You can see details about other people's jobs using the same qstat -f command described in the previous section. If the cluster is especially busy, you may wish to wait before debugging a new MPI program; otherwise you might wait an hour or more every time you start the program.
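To judge at a glance how busy the cluster is, you could count the running and queued jobs in the listing. The following is a rough sketch under the same assumptions as the output shown above (two header lines, state letter in the fifth column); queue_summary is a hypothetical helper name.

```shell
# Sketch: summarise default qstat output (standard input) as the number
# of running (R) and queued (Q) jobs. Other state letters are ignored.
queue_summary() {
    awk 'NR > 2 && NF >= 6 { count[$5]++ }
         END { printf "running=%d queued=%d\n", count["R"], count["Q"] }'
}
```

Run as qstat | queue_summary. For the five-job listing shown above, it would print running=4 queued=1.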

Document generated by Confluence on Mar 31, 2011 15:37